The reinforced concrete (RC) shear wall is one of the most 
widely adopted earthquake-resisting structural elements. 
Accurate prediction of the capacity curves of RC shear walls is 
of significant importance since these curves convey important 
information about progressive damage states, the degree of 
energy absorption, and the maximum strength. Decades-long 
experimental efforts of the research community have established a 
systematic database of capacity curves, but the productive use 
of the accumulated data is still in its infancy. In the 
hope of adding a new dimension to earthquake engineering, 
this study provides a machine learning (ML) approach to 
predict capacity curves of RC shear walls based on a 
multi-target prediction model and fundamental statistics. 
This paper harnesses bootstrapping for uncertainty 
quantification and affirms the robustness of the proposed 
method against erroneous data. Results and validations using 
more than 200 rectangular RC shear walls show promising 
performance and suggest future research directions toward 
data- and ML-driven earthquake engineering. 
1. Introduction 


In the past decades, persistent efforts have been devoted to gaining insights into the nonlinear 
behaviors of damaged rectangular RC walls [1-4]. Driven by these accomplishments, the 
research community benefits from databases (e.g., ACI 445B Shear Wall Database, PEER 
Structural Performance Database, and DesignSafe Platform). Many ML-based predictions of 
RC structures have been attempted [5,6]. ML gives computers the ability to learn from complex 
data without explicitly programmed rules. Regarding the number of prediction targets, ML can be 
categorized into single-target and multiple-target prediction methods. There exist 
various applications of single-target ML methods in infrastructure engineering. The shear 
strength of deep beams was predicted by support vector regression (SVR); the 
researchers modified the SVR algorithm to optimize hyperparameters to be more suitable for 
civil applications [7,8]. Valdebenito [9] estimated the in-plane shear strength of reinforced 
masonry (RM) using an artificial neural network (ANN). The ANN model was trained and tested 
with 285 RM walls from the literature. The compressive strength of high-performance concrete 
has been predicted using an ensemble method [10]. Furthermore, for vertical 
structural elements, horizontal forces were predicted via support vector machines and 
ANNs [5,11]. 


However, ML-based prediction of force-displacement (F-D) capacity curves is challenging since 
it involves multiple-target predictions. Two rare examples of curve prediction include predictions 
of soil-water characteristic curves (SWCCs) using genetic programming (GP) [12] and ANN 
[13], respectively. In Johari’s work, SWCC itself was learned and predicted by the GP, but the 
final prediction is a complex mathematical expression of the curve. Sajib developed ANN 
models of the SWCC fitting parameters to predict the suction-water content relationship. 


In this paper, we adopt a multi-target regression model (MTRM) to predict the capacity curves of 
RC shear walls. This paper is structured as follows. The second section describes the 
methodology of the MTRM and its extension with ensemble learning. The third section presents 
the complete procedures to build the capacity curve database and perform capacity curve prediction. 
The fourth section summarizes predictive results, validation, and the impact of the extended database 
and erroneous data on the proposed method. Finally, the last section presents the conclusions and 
discusses the limitations and future extensions. 


2. Multi-target regression model 


MTRM has been implemented in the open-source machine learning system (named Clus) 
developed by Struyf [14]. Clus is a decision tree and rule learning system that works within 
the predictive clustering tree (PCT) framework [14]. Prior to the demonstration of MTRM, it is 
instructive to introduce the background of ML. There are two categories of ML methods depending 
on the training data. The first category is “supervised” learning, in which ML trains with data 
consisting of pairs $\{x^{(l)}, y^{(l)}\}$, where $x^{(l)}$ is a vector of descriptive variables and 
$y^{(l)}$ is a real-valued target vector. The superscript $(l)$ indicates labeled data. In contrast, 
“unsupervised” learning trains ML with unlabeled data consisting of $\{x^{(u)}\}$, where the 
superscript $(u)$ indicates unlabeled data. A 
decision tree, a typical supervised learning method, is a tree-shaped graph that uses a branching 
method to demonstrate every possible outcome of a decision. It is widely used in data mining to 
simplify complex problems. It usually starts with a single node, which branches into all possible 
outcomes. 


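[Figure 1: panel (a) shows a toy PCT over instances $\{x_1, x_2, y\}$ whose nodes and leaves are 
clusters (e.g., Cluster 2 and Cluster 3); panel (b) shows rules obtained by ConvertPCTsToRules, 
e.g., "if $x_1 > 3$, then $y = 3$".] 

Fig. 1. (a) Illustrative example of a PCT; (b) example of a rule ensemble. 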


Each of those outcomes will branch into other nodes, which represent other possibilities. 
Clustering, a representative unsupervised learning method, tries to find a collection of points that 
are similar to each other, in terms of homogeneous values of all variables, compared with points 
outside the cluster. Decision trees and clustering are therefore considered quite different 
methods: decision trees partition instances into subsets in terms of the values of target attributes 
only, whereas clustering splits instances into subclusters regarding the values of all descriptive 
attributes. Notably, a PCT is a decision tree whose leaves do not contain classes, and each node, 
as well as each leaf, corresponds to a cluster, as in Fig. 1 (a) with instances in the form of 
$\{x_1, x_2, y\}$. In contrast to both, PCTs search for subsets using the values of both descriptive 
attributes and target attributes [15]. MTRM shares the same algorithm with PCTs in the context of 
constructing clusters. PCTs can be built with a standard “top-down induction of decision trees” 
(TDIDT) algorithm [16]. A top-down PCT is shaped like a triangle with its root at the top. All 
instances are located at the root at the beginning, and they are partitioned into subclusters by 
tests. The pseudocode for constructing PCTs is presented in Table 1 [17]. 


It is instructive to recap the key strategies of PCTs, i.e., a splitting criterion, a stopping 
criterion, and a pruning strategy. There are many splitting criteria (e.g., Shannon entropy [18] and 
Gain Ratio [19]). The purpose of splitting a cluster is to obtain subclusters such that the 
intra-cluster distance (the distance between examples belonging to the same cluster) is minimized. 
For regression problems, the intra-cluster distance is specified as the intra-cluster variance. 
Given a cluster and a test that partitions the cluster to decrease the variance, the 
intra-cluster variance is defined as: 


$$\mathrm{Var} = \frac{1}{N} \sum_{i=1}^{N} d(x_i, \bar{x})^2 \qquad (1)$$ 

where $\bar{x} \in \mathbb{R}^n$ is the mean vector of the cluster, $x_i \in \mathbb{R}^n$ 
($i = 1, \dots, N$) is an element in the cluster, and $N$ is the total number of elements in the 
cluster. The entity $d$ stands for the Euclidean distance. Growing trees without stopping criteria 
will lead to an overfitting problem. Often, a test is applied to check whether the class 
distribution in the sub-clusters differs significantly. Since the regression problem uses 
intra-cluster variance as the heuristic for choosing the best split, a reasonable stopping 
criterion is to use an F-test to check whether the variance decreased significantly; if it did, an 
acceptable test has been found. 


Table 1 

Algorithm of constructing PCTs. 

 1: Function PCT(Training instances $I$): 
 2:   $(t^*, p^*)$ = BT($I$); 
 3:   If $t^* \neq$ none 
 4:     for each $I_k \in p^*$ 
 5:       $Tree_k$ = PCT($I_k$); 
 6:     return node($t^*$, $\{Tree_k\}$); 
 7:   else 
 8:     return leaf(prototype($I$)); 
 9: Function BT($I$): 
10:   $p$ = partition induced on $I$ by $t$; 
11:   $(t^*, p^*, h^*)$ = (None, $\emptyset$, 0); 
12:   $h = \mathrm{var}(I) - \sum_{I_k \in p} \frac{|I_k|}{|I|} \mathrm{var}(I_k)$; 
13:   for each test $t$ 
14:     if ($h > h^*$) 
15:       $(t^*, p^*, h^*) = (t, p, h)$; 
16:   return $(t^*, p^*)$; 


If no acceptable test is found, the algorithm labels the leaf with the prototype instances and stops 
the growth. Pruning is a technique that removes trivial parts of the tree that contribute little to 
identifying instances. Often pruning is done randomly for large data. This paper does not adopt any 
pruning strategy due to our small database size. An illustration of the pseudocode for constructing 
PCTs will help engineers gain a comprehensive understanding of the MTRM. The PCT function 
takes instances $I$ as input to grow trees. An instance represents a row of the dataset in this paper. 
The function PCT in line 1 of Table 1 is the algorithm's main function, which grows the decision 
tree until the stopping criteria are met. The function BT is invoked in line 2 to search for the best 
test to partition the training instances into hierarchical clusters. BT returns the optimal $t$ and 
$p$, denoted as $(t^*, p^*)$, where $t$ is a test on attribute values that induces a partition on $I$, 
and $p$ is the partition induced on $I$ by $t$ (e.g., in Fig. 1 (a), a test $t$ on the root node checks 
whether $x_1$ is larger than two to partition $I$ at the root into two sub-clusters via a partition 
$p$). The superscript "*" represents the optimal (i.e., best-so-far) quantities. With BT in line 2, 
the PCT function is invoked recursively to obtain trees and the corresponding nodes within the loop 
in lines 5 and 6. However, if no best test is found (line 7), the algorithm returns a leaf labeled 
with the prototype instances in line 8. Usually, the prototype is the instance with the lowest 
average distance to all other instances in the cluster, such as the mean of the original instances. 


Function BT is explained in lines 9-16 of Table 1. BT searches for the best test to partition 
the cluster so as to minimize intra-cluster variance (i.e., maximize inter-cluster variance). In 
line 11, the candidates for the best test ($t^*$) along with the corresponding partition ($p^*$) and 
heuristic value ($h^*$) are initialized. Here, $h$, defined in line 12, is the heuristic value of 
$t$. Function var is defined in Eq. (1). Since $t^*$ is initially unknown, $h^*$ is set to zero. The 
loop in line 13 calculates the heuristic values of all possible tests to partition clusters. A test 
and its partition replace the current best whenever the heuristic value $h$ is larger than the 
best-so-far heuristic value $h^*$ (lines 14-15). 
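
The best-test search can be illustrated with a minimal Python sketch. It is not the Clus implementation: it assumes that candidate tests are simple thresholds of the form "x[f] > thr", that the variance of Eq. (1) is evaluated on the target vectors only, and that the heuristic is the size-weighted variance reduction; the names `variance` and `best_test` and the toy data are hypothetical.

```python
from statistics import fmean

def variance(targets):
    """Eq. (1): mean squared Euclidean distance of the rows to their mean vector."""
    dim = len(targets[0])
    mean = [fmean(row[k] for row in targets) for k in range(dim)]
    return fmean(sum((row[k] - mean[k]) ** 2 for k in range(dim)) for row in targets)

def best_test(X, Y):
    """Scan every threshold test 'x[f] > thr' and keep the largest variance reduction."""
    best = (None, None, 0.0)  # (feature, threshold, h*), with h* initialised to zero
    n, total = len(X), variance(Y)
    for f in range(len(X[0])):
        # Dropping the largest value keeps both sides of the split non-empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [y for x, y in zip(X, Y) if x[f] <= thr]
            right = [y for x, y in zip(X, Y) if x[f] > thr]
            h = total - len(left) / n * variance(left) - len(right) / n * variance(right)
            if h > best[2]:  # keep the best-so-far test
                best = (f, thr, h)
    return best

# Four toy instances: two descriptive variables and one target (assumed data).
X = [[1.0, 5.0], [1.5, 1.0], [3.0, 4.0], [4.0, 2.0]]
Y = [[1.0], [1.0], [3.0], [3.0]]
feature, threshold, gain = best_test(X, Y)
print(feature, threshold, gain)
```

For this toy data the search selects the test on the first variable with threshold 1.5, which separates the two target levels and removes all of the variance.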


2.1. Ensemble method 


An ensemble method has been used to boost the prediction accuracy of this study. This method 
generates an ensemble of prediction models since combining a number of predictions is often 
more accurate than an individual prediction model [20,21]. 


Table 2 
Pseudo code of ensemble method. 

1: Let $M$ = the original training data; $n_{pm}$ = number of prediction models; $X$ = the test data 
2: for $i = 1$ to $n_{pm}$ do 
3:   Create an identical training set $M_i$ from $M$ 
4:   Build a prediction model $PM_i$ with $M_i$ 
5: end 
6: for each test record $x_j \in X$, $j = 1, \dots, n$ do 
7:   $PM_{final}(x_j) = \frac{\sum_{i=1}^{n_{pm}} PM_i(x_j)}{n_{pm}}$ 
8: end 


The general procedures for the ensemble method are summarized in Table 2. In line 3 of Table 2, 
the main loop creates $n_{pm}$ sets of training data $M_1, \dots, M_{n_{pm}}$ by the simple random 
sampling method. It is a naive sampling method in which every possible sample $M_i$ of a given size 
drawn from the population $M$ is equally likely [22]; that is, each instance has an equal 
probability of being selected. Line 4 utilizes these sets of training data to train $n_{pm}$ base 
prediction models $PM_1, \dots, PM_{n_{pm}}$. Then line 7 aggregates the predictions of all the 
models and takes their arithmetic mean as the final output for the regression problem. Various 
approaches have been successfully applied to construct ensembles. The popular ones are bootstrap 
aggregation (so-called bagging), boosting, and random forests. Bagging, a technique that generates 
multiple bootstrap samples with replacement, is frequently used in classification and regression to 
improve stability and accuracy [23]. Instead of generating a succession of independent bootstrap 
samples, boosting trains multiple base prediction models using a weighted data set. The weights of 
samples are adjusted by placing more weight on misclassified samples [24]. In this paper, random 
forests are implemented following the conclusion by Dragi that multi-objective random forests are 
significantly better than multi-objective bagging [25]. Random forests share the same general 
procedures with other ensemble methods in Table 2. The general procedures to build random forests 
are as follows: 


1. Subset the training data $M$ into $i$ bootstrap samples $M_1, \dots, M_i$ (line 3 of Table 2). 


2. Build $i$ decision trees $DT_1, \dots, DT_i$ with the corresponding $M_i$, as suggested in line 4. 
At each node, variables are selected at random out of all the features, and the best splits on these 
variables are used to split the node. Each tree is grown to the largest extent without pruning. 

3. Perform prediction with test data using each tree $DT_i$ (line 7). The final prediction is the 
average of $PM_1(x_j), PM_2(x_j), \dots, PM_i(x_j)$ because it is a regression problem ($PM_i(x_j)$ 
is the prediction from decision tree $DT_i$). 
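
The resample, train, and average pattern of these steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the random forest used in the paper: the base model is a hypothetical 1-nearest-neighbour predictor standing in for a decision tree, the random selection of variables at each node is omitted, and the toy data and the names `bootstrap`, `fit_nearest_neighbor`, and `ensemble_predict` are invented for the example.

```python
import random

def bootstrap(data, rng):
    """Draw len(data) records with replacement (one bootstrap sample M_i)."""
    return [rng.choice(data) for _ in data]

def fit_nearest_neighbor(sample):
    """Stand-in base model PM_i: return the target vector of the closest training record."""
    def predict(x):
        nearest = min(sample, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x)))
        return nearest[1]
    return predict

def ensemble_predict(models, x):
    """Average the target vectors predicted by all base models."""
    preds = [model(x) for model in models]
    return [sum(p[k] for p in preds) / len(preds) for k in range(len(preds[0]))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Each record: (descriptive variables, target vector); values are assumed toy data.
M = [((0.1, 0.2), (1.0, 10.0)), ((0.4, 0.1), (2.0, 20.0)),
     ((0.8, 0.9), (3.0, 30.0)), ((0.9, 0.5), (4.0, 40.0))]
models = [fit_nearest_neighbor(bootstrap(M, rng)) for _ in range(25)]
y_hat = ensemble_predict(models, (0.5, 0.5))
print(y_hat)
```

Because every base model returns a full target vector, the averaging step handles multiple targets at once, which is the property the multi-target setting relies on.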


In this paper, random forests have been employed as the ensemble learning method to work 
with MTRM. Random forests work with MTRM in two respects. First, 
MTRM generates a collection of PCTs by bagging, as in random forests, instead of a single decision 
tree. Second, MTRM randomly picks attributes as input for function BT in Table 1 instead of 
using all attributes to find the best test to partition the cluster. 


Table 3 
Algorithm of constructing rule ensembles. Note that $I$ is training instances, and $T$ is a collection of PCTs. 
$R$ and $W$ represent the collection of rules generated from $T$ and their corresponding weights. 

1: GenerateSetOfPCTs($I$): 
2:   return $T$; 
3: ConvertPCTsToRules($T$): 
4:   return $R$; 
5: OptimizeWeights($R$, $I$): 
6:   If (weight of $r \in R$ = 0) 
7:     remove $r$; 
8:   return ($R$, $W$); 


2.2. Rule ensemble for MTRM 


Large ensembles of PCTs are hard to interpret. Thus, all PCTs are transcribed into a collection of 
rules. A rule ensemble, a collection of unordered rules whose predictions are combined via 
weighted voting, is an expressive and human-readable model representation. Each rule is a 
conjunction of statements on the input variables. To briefly explain how the rule ensemble 
interprets the MTRM, the key algorithm to achieve rule ensembles of MTRM is summarized in Table 3 
[26]. In line 1 of Table 3, function GenerateSetOfPCTs repeatedly calls function PCT in Table 1 to 
generate a bagged set of PCTs, and line 2 returns the collection of PCTs. All the trees are then 
transcribed into sets of rules by function ConvertPCTsToRules in line 3 [27]. Line 5 finds the 
optimized weight for each of the rules in $R$ by function OptimizeWeights. During this process, the 
optimizer tries to set as many weights as possible to zero, with the aim of learning a small and 
interpretable model. A gradient-directed optimization algorithm [26] optimizes all the weights. 
Each weight indicates the importance of the corresponding rule's contribution to the final 
prediction. Lines 6 and 7 remove the rules whose optimal weights are zero. Finally, line 8 returns 
the collection of rules whose weights are not zero, together with their corresponding weights. 
Hence, the final prediction can be computed by the following equation: 


$$\hat{y} = w_0\,\mathrm{avg} + \sum_{i=1}^{M} w_i r_i(x) \qquad (2)$$ 


where $w_0\,\mathrm{avg}$ is the baseline prediction, in which avg is a constant vector holding the 
average of each target. The entity $r_i$ is a vector function that gives a constant prediction, as 
shown in Fig. 1 (b) as a toy example, and $w_i$ is the corresponding weight of the rule. Note that 
$M$ indicates the number of rules in a PCT. Fig. 1 (a) considers a population of instances with two 
descriptive variables in the form of $\{x_1, x_2\}$ and a target response $\{y\}$. A toy PCT is 
constructed on top of 


96 Y. Yang, I.H. Cho/ Journal of Soft Computing in Civil Engineering 5-4 (2021) 90-113 


the tests found, and each cluster of the PCT is represented by a conditional statement produced by 
function ConvertPCTsToRules, as in Fig. 1 (b). A prediction of $y$ with $\{x_1, x_2\} = \{5, 0.1\}$ 
is calculated as: 


$$\hat{y} = 0.95(1) + 0.2\,[\text{if } (x_1 > 4), \text{ then } (1)] + 0.4\,[\text{if } (x_1 > 3), \text{ then } (3)] + 0\,[\text{if } (x_1 < 2), \text{ then } (2)]$$ 
$$\qquad + 0.3\,[\text{if } (x_2 < 1), \text{ then } (1)] + 0\,[\text{if } (x > 2.5), \text{ then } (6)]$$ 


$$= 0.95 + 0.2 \times 1 + 0.4 \times 3 + 0 \times 2 + 0.3 \times 1 + 0 \times 6 = 2.65 \qquad (3)$$ 
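
The toy calculation can be reproduced with a short Python sketch of Eq. (2). The rule list mirrors the weights, conditions, and constants of the toy example; the variable tested by the last rule is not legible in the source, so $x_2$ is assumed here (its weight is zero, so the choice does not affect the result). The names `rules` and `predict` are hypothetical.

```python
# Each rule: (weight, condition on the descriptive variables, constant prediction).
rules = [
    (0.2, lambda x1, x2: x1 > 4, 1.0),
    (0.4, lambda x1, x2: x1 > 3, 3.0),
    (0.0, lambda x1, x2: x1 < 2, 2.0),
    (0.3, lambda x1, x2: x2 < 1, 1.0),
    (0.0, lambda x1, x2: x2 > 2.5, 6.0),  # tested variable assumed; the weight is zero
]
w0, avg = 0.95, 1.0  # baseline weight and average target of the toy example

def predict(x1, x2):
    """Baseline prediction plus the weighted constants of the rules that fire."""
    return w0 * avg + sum(w * value for w, cond, value in rules if cond(x1, x2))

y_hat = predict(5, 0.1)
print(round(y_hat, 2))
```

In this sketch a rule whose condition is not satisfied contributes nothing; for the query above the only such rules carry zero weight, so the result agrees with the hand calculation.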


Conditions in the statements only take descriptive attributes into account because the rules will 
be applied to the new unlabeled instances. In this paper, there are eight target variables, and thus 
each rule will give a resultant vector of dimension eight. The adopted MTRM is PCTs employing 
random forests, and the model is transcribed into a rule ensemble for better interpretation, 
enabling the proposed model to predict multiple targets simultaneously. 


2.3. Clus 


MTRM has been implemented in Clus, an open-source machine learning software that can be 
downloaded from [28]. Clus is a decision tree and rule learning system that works within the PCT 
framework [14]. It is a Java-based platform to build both classification and regression trees by 
choosing different operation settings. It has been successfully applied to plenty of tasks, 
including multi-target regression and classification, structured output learning, time series 
prediction, etc. [14]. Clus provides many choices for operation settings. Here, the operation 
settings related to multiple-target regression are explained. First, three input files are 
required: (1) a file with training data, (2) a file with test data, and (3) a file specifying all 
the parameter settings. The training and test data dictionary (i.e., file names and variable types) 
should be listed in the settings file. Descriptive and target attributes in the dataset should be 
specified explicitly. Other functionalities, including the choices of ensemble method and rule 
ensemble, should be set accordingly. Appendix A presents a brief example of input files. Full 
practical example files are available in [29]. After training the model, an output file is 
generated that contains predictions for the multiple target attributes. In addition, one can access 
the graphic PCTs in the output file, an example of which is shown in Appendix B. Readers are 
referred to the Clus manual for detailed instructions and additional settings. 


3. Prediction of capacity curve 


Although the proposed ML-based approach to capacity curve prediction can be applied to any 
RC structure, this study demonstrates its potential by focusing on the capacity curves of 
rectangular RC shear walls. The training database is built upon a hybrid database consisting of real 
experimental results and computational simulation results. A high-precision parallel finite 
element analysis platform (called VEEL, meaning virtual earthquake engineering laboratory) has 
been adopted to ensure reliably simulated curves. VEEL’s general applicability and accuracy 
have been well documented in [30]. VEEL is rooted in a number of microphysical mechanisms, 
including a multi-directional smeared crack model, a topological information-based steel bar 
model capable of capturing progressive bar buckling, a 3D interlocking-based nonlinear shear 
mechanism, and a bar-concrete proximity-based general confinement model. An optimized 
parallel computing algorithm is leveraged to effectively link millimeter length-scale mechanisms 
to real-scale RC walls [31,32]. 


3.1. Transform capacity curve into multivariate targets 


The size of the experiment-based database is too small for ML training. We need to enrich the 
experimental database with simulated data without introducing a substantial loss of accuracy. 
The original database contains global F-D responses of seven rectangular shear walls (i.e., RW1, 
WSH1, WSH2, WSH3, WSH4, WSH5, and WSH6). Fig. 2 compares the experimental F-D responses 
from the existing literature [2,33] with the F-D responses simulated by VEEL to emphasize 
the precision of the original database. As summarized in Table 4, the variations occur in the axial 
force ratio ($\alpha$) in percent, yield stress ($f_y$) in MPa, the diameter of vertical 
reinforcement ($d_b$) in millimeters, and concrete compressive strength ($f'_c$) in MPa. It is 
challenging to recast the continuous capacity curve as a multivariate target that machine learning 
can learn and predict. The overall procedures to extract the F-D capacity curve database are 
illustrated in Fig. 3. In Task 1 of Fig. 3, it is essential to extract the outermost points. Most of 
the outermost points are related to the overall envelope of the capacity curve of a shear wall 
subjected to reversed cyclic loading. Although there is no strict restriction, 46 points are 
extracted per capacity curve in the shear wall database, as visualized in Fig. 4. More points would 
improve the accuracy of the fitted capacity curve, but this choice appears acceptable to capture 
the overall nonlinear envelopes reasonably. The extracted points on the capacity curve envelope are 
denoted as $\{d_i, F_i\}$, $i = 1, \dots, 46$, where $d_i$ is a displacement and $F_i$ is the 
associated force. We perform separate least-squares fittings 
on the positive and negative regimes to account for asymmetric shapes of general capacity curve 
envelopes. Let $\beta \in \mathbb{R}^{2p}$ stand for the parameters to be determined, with 
$\beta = [\beta_P; \beta_N]$, $\beta_P = \{P_1, P_2, \dots, P_p\}^T$ and 
$\beta_N = \{N_1, N_2, \dots, N_p\}^T$. Then, the optimal parameters (denoted by $\hat{\beta}$) for 
the positive and negative regimes are obtained by 


$$\hat{\beta}_P = \underset{\beta_P}{\arg\min}\, \|F - d\beta_P\|_2^2, \quad \text{for } d_i \in \mathbb{R}^+ \qquad (4)$$ 

$$\hat{\beta}_N = \underset{\beta_N}{\arg\min}\, \|F - d\beta_N\|_2^2, \quad \text{for } d_i \in \mathbb{R}^- \qquad (5)$$ 


where $d$ is the model matrix, $d \in \mathbb{R}^{46 \times 4}$, whose $i$th row is 
$d_i = \{d_i, d_i^2, d_i^3, d_i^4\}$. The envelope force vector is $F = \{F_1, F_2, \dots, F_{46}\}$. 
Thus, the $p$-parameter fitted model for the capacity curve envelope is succinctly given by: 


$$F_i = H(d_i) \sum_{j=1}^{p} P_j d_i^j + H(-d_i) \sum_{j=1}^{p} N_j d_i^j \qquad (6)$$ 


where $H(d)$ is the unit step function (i.e., one for $d > 0$, zero otherwise); $p$ is the highest 
order of the base polynomials. This study chose $p = 4$ for the polynomial bases, rooted in the prior 
knowledge that most capacity curves exhibit convex or concave shapes. A higher-order 
fitting may help, but our choice is justifiable since the values of $R^2$ (the coefficient of 
determination) calculated using our approach are commonly larger than 0.99. For the subsequent 
multi-target machine learning, we added the optimal parameters 
$\hat{\beta} = [\hat{\beta}_P; \hat{\beta}_N] = \{P_1, P_2, P_3, P_4, N_1, N_2, N_3, N_4\}^T$ onto the 
existing wall database. Thus, 32 descriptive variables and eight target variables are included in 
the finalized database. Detailed variable information is summarized in Appendix C. Overall, the 
capacity curve database dimension is 182 × 40 (i.e., 182 instances with 40 attributes). 
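
The envelope fitting can be sketched in plain Python as below. The sketch is illustrative only: it solves the least-squares problems of Eqs. (4) and (5) through the normal equations with a small Gaussian elimination routine, which is an assumed solver choice, and it uses synthetic envelope points on displacements scaled to [-1, 1] so that the normal equations stay well conditioned. The helper names (`solve`, `fit_branch`, `envelope`) and the coefficient values are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_branch(d, F, p=4):
    """Least-squares coefficients c_1..c_p of F ~ sum_j c_j d^j (no constant term)."""
    rows = [[di ** j for j in range(1, p + 1)] for di in d]
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    AtF = [sum(r[a] * Fi for r, Fi in zip(rows, F)) for a in range(p)]
    return solve(AtA, AtF)

def envelope(d, P, N):
    """Eq. (6): P coefficients act for d > 0, N coefficients for d < 0, zero at d = 0."""
    coeffs = P if d > 0 else N if d < 0 else []
    return sum(c * d ** j for j, c in enumerate(coeffs, start=1))

# Synthetic envelope points generated from assumed coefficients.
P_true, N_true = [4.0, -3.0, 1.0, -0.5], [3.5, 2.0, 0.5, 0.2]
d_pos = [0.1 * k for k in range(1, 11)]
d_neg = [-v for v in d_pos]
F_pos = [envelope(v, P_true, N_true) for v in d_pos]
F_neg = [envelope(v, P_true, N_true) for v in d_neg]

P_hat = fit_branch(d_pos, F_pos)  # positive regime, Eq. (4)
N_hat = fit_branch(d_neg, F_neg)  # negative regime, Eq. (5)
print([round(c, 6) for c in P_hat + N_hat])  # the eight target variables
```

Fitting the two regimes separately is what allows an asymmetric envelope, and the eight fitted coefficients are exactly the quantities that become the multivariate target. For unscaled displacements, a QR-based or library least-squares solver would be a more robust choice than the normal equations.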


Table 4 
Details of the original rectangular shear wall database. 

              RW1         WSH1      WSH2      WSH3      WSH4      WSH5      WSH6 
$\alpha$ (%)  0~30        0~40      0~40      0~40      0~40      0~40      0~40 
$f_y$ (MPa)   300~600     450~610   500~710   500~720   500~640   500~710   500~650 
$d_b$ (mm)    12.7~28.6   8~14      8~15      8~15      8~15      6~12      8~15 
$f'_c$ (MPa)  37.7        30~60     30~60     30~60     30~60     30~60     30~60 

[Figure 2: force versus displacement hysteresis loops for walls WSH1 to WSH6, with the displacement 
axis spanning -100 to 100 and the force axis spanning -600 to 600; markers indicate web bar 
fracture, boundary bar fracture, compression zone crushing, and confinement tie fracture.] 

Fig. 2. (Top six panels) experimental F-D responses versus (bottom six panels) simulated F-D responses 
by VEEL. 

Task 1: Extract 46 envelope points of capacity curves from hybrid database 
Task 2: Least square fitting of capacity curve envelopes using polynomial bases 
Task 3: Store the fitted coefficients of polynomial bases into hybrid database as target variables 


Fig. 3. Flowchart of transformation of capacity curve database to multiple target database. 


[Figure 4: force [kN] versus displacement [mm] response with the extracted outermost points.] 

Fig. 4. Example of extraction of 46 outermost points from force-displacement (F-D) responses. 


3.2. Multi-target prediction of capacity curve 


This section explains the complete process of the multiple-target ML prediction of the capacity 
curves using PCTs. PCTs consider trees as a hierarchy of clusters with respect to many observed 
descriptive variables to build trees that predict multiple targets simultaneously. As explained in 
the previous section, our hybrid database contains 32 descriptive variables (denoted as 
$X \in \mathbb{R}^{n \times 32}$) and eight target variables ($Y \in \mathbb{R}^{n \times 8}$). Thus, 
the $i$th row of $X$ is $X_{(i)} = \{x_1, \dots, x_{32}\}_{(i)}$, whereas the $i$th row of $Y$ is 
$\{P_1, P_2, P_3, P_4, N_1, N_2, N_3, N_4\}_{(i)}$. The prediction task is to predict 
$Y_{(new)} \in \mathbb{R}^{8}$ given a new query $X_{(new)} \in \mathbb{R}^{32}$. Fig. 5 summarizes 
the general procedures of initial setup, training, prediction, and visualization. We will elaborate 
on each sub-task as follows. 


3.2.1. Initial preparation 


Task 1 in Fig. 5 summarizes the key procedure before launching multiple target ML. Ranges of 
variables in the hybrid database are wide, e.g., ranging from 0.01 to 2.23x10”. To be consistent 
and prevent any unit-dependent effect in PCTs, we normalized all attributes to the range of [0, 
1]. We considered two normalization schemes: “min-max” and “standard deviation” 
normalizations as candidates. In the min-max normalization, normalization is done by 


$$x_i' = \frac{x_i - x_{min}}{x_{max} - x_{min}} \qquad (7)$$ 


where $x_{min}$ and $x_{max}$ are the minimum and maximum of the $i$th attribute, respectively. In 
the standard deviation normalization, we have 


$$x_i' = \frac{x_i - \bar{x}}{s} \qquad (8)$$ 


where $\bar{x}$ and $s$ are the mean and the standard deviation of the $i$th attribute, 
respectively. To quantitatively compare the impacts of the normalization schemes, we compare 
multi-target predictions of three cases: using (1) the original database without any normalization, 
(2) the database normalized by the min-max scheme, and (3) the database normalized by the standard 
deviation. All the initial settings of the MTRM model are kept identical across the three cases. 
From this preliminary comparison, the “min-max normalization” appears to lead to the lowest MAE. 


Task 1: Preparation 
(a) Normalize all variables to [0, 1] 
(b) Record $D_M$ 
(c) Shuffle data to 70% training and 30% test data 

Task 2: Prediction 
(a) Perform multi-target ML using Clus 
(b) Calculate $MAE_{avg}$ 

Task 3: Visualization 
(a) Backward mapping of predicted coef. of polynomial bases 
(b) Reconstruct F-D curves 
(c) Confidence interval 


Fig. 5. Multi-target prediction flowchart from initial preparation, training and prediction, and 
postprocessing and investigation. ($D_M$: Mahalanobis Distance; $MAE_{avg}$: the averaged mean 
absolute error of the multiple-target prediction). 


Hence, this study adopts the “min-max” normalization throughout the following procedures. 
$x_{max}$ and $x_{min}$ of each attribute must be stored for future backward mapping (i.e., from the 
normalized target to the actual response, Task 3 (a) of Fig. 5). 
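
A minimal Python sketch of the min-max normalization and its backward mapping is given below. It assumes that the data are stored column by column and that no attribute is constant (otherwise the denominator vanishes); the function names and the sample values are hypothetical.

```python
def fit_min_max(columns):
    """Record (min, max) of every attribute so the mapping can be inverted later."""
    return [(min(col), max(col)) for col in columns]

def normalize(columns, bounds):
    """Min-max normalization: map every attribute to [0, 1]."""
    return [[(v - lo) / (hi - lo) for v in col] for col, (lo, hi) in zip(columns, bounds)]

def denormalize(columns, bounds):
    """Backward mapping from [0, 1] to the original range of each attribute."""
    return [[v * (hi - lo) + lo for v in col] for col, (lo, hi) in zip(columns, bounds)]

# Two attributes with very different ranges (assumed sample values).
data = [[0.01, 0.5, 1.2], [300.0, 450.0, 600.0]]
bounds = fit_min_max(data)
scaled = normalize(data, bounds)
restored = denormalize(scaled, bounds)
print(scaled[1], restored[1])
```

Storing `bounds` at fit time is the step the text calls for: the same pairs must be reused to map predictions back, rather than being recomputed from new data.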


Although our hybrid dataset has more than 200 instances, it is still relatively small for reliable 
ML training. The PCTs may not learn stable rules near the outer borders of the 
multiple descriptive variables. This issue is the so-called “extrapolation” problem, an intrinsic 
limitation of statistical models. In short, a statistical or ML model can predict well when the new 
instance is similar to those inside the data space. Still, its accuracy decreases as the new 
instance approaches the borderlines or lies beyond the data space. In those ranges, prediction 
becomes an extrapolation since similar cases have never been experienced [34]. Therefore, it is 
important to understand each instance’s relative location in the entire data space. In addition, it 
is instructive to note that the data space covered by the database is scattered and refers to the 
space with more than one instance experienced inside. In the hope of quantitatively determining the 
borderlines of the scattered data space and facilitating visualization of the relative position of 
new instances in the entire data space, we adopted the Mahalanobis Distance (denoted as $D_M$). For 
a data point in the multidimensional space, $D_M$ measures how many standard deviations away the 
point is from the mean of the multidimensional space by 


$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)} \qquad (9)$$ 


where $x$ is an instance in the descriptive data space (here $x = \{x_1, x_2, \dots, x_{32}\}^T$), 
$\mu$ is the vector of the means of the descriptive variables (here, 
$\mu = \{\mu_1, \mu_2, \dots, \mu_{32}\}^T$), and $S$ is the covariance matrix. We calculate and 
record $D_M$ in the database as auxiliary information (Task 1 (b) of Fig. 5). This information 
determines whether a new data point is inside the database space, close to its border, or beyond 
the existing database. To facilitate the unbiased training of PCTs, we randomly shuffled 
the database and split it into 70% training data and 30% test data (Task 1 (c) of Fig. 5). 
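
The distance of Eq. (9) can be sketched in plain Python as follows. The sketch assumes the sample covariance (divisor n - 1) and a non-singular covariance matrix, neither of which is specified in the text, and it solves a linear system instead of forming the inverse explicitly. The helper names and the two-variable toy data are hypothetical.

```python
from math import sqrt

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def mahalanobis(x, data):
    """Distance of x from the mean of `data`, scaled by the sample covariance."""
    n, dim = len(data), len(data[0])
    mu = [sum(row[k] for row in data) / n for k in range(dim)]
    S = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in data) / (n - 1)
          for b in range(dim)] for a in range(dim)]
    diff = [x[k] - mu[k] for k in range(dim)]
    z = solve(S, diff)  # z = S^{-1} (x - mu) without forming the inverse
    return sqrt(sum(d * zi for d, zi in zip(diff, z)))

# Four instances with two descriptive variables (assumed toy data).
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
d_center = mahalanobis((1.0, 1.0), data)
d_outside = mahalanobis((3.0, 1.0), data)
print(d_center, d_outside)
```

A query at the mean has zero distance, while a query outside the sampled range gets a larger value, which is how the recorded distance can flag predictions that amount to extrapolation.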

3.2.2. Training and test of the multi-target prediction model 


As shown in Task 2 of Fig. 5, the next step is to train the model and perform the multi-target 
prediction. PCTs generate two types of prediction results: original predictions and pruned 
predictions. When a very large PCT is grown, it typically learns the details and noise in the 
training data to the extent that the performance of the model on new instances is negatively 
affected. Pruning the PCT by one of the pruning criteria eliminates this negative impact. Only 
original predictions are considered in this paper because the pruned prediction is only necessary 
for very large data sets [16]. Random forests are used as the ensemble learning method. Among the 
many measures of prediction accuracy in the ML domain, we adopted the mean absolute error 
(MAE). Since we are predicting multiple targets, each target has its own MAE: 


$$MAE_i = \frac{100}{n} \sum_{j=1}^{n} \left| \frac{A_{i(j)} - P_{i(j)}}{A_{i(j)}} \right| \qquad (10)$$ 


where $MAE_i$ is the MAE of the $i$th target, and $A_{i(j)}$ and $P_{i(j)}$ are the true value and 
predicted value of the $i$th target of the $j$th instance, respectively; $n$ is the total number of 
instances. Then, the overall MAE of all target attributes (denoted as $MAE_{avg}$) is calculated as: 


$$MAE_{avg} = \frac{1}{g} \sum_{i=1}^{g} MAE_i \qquad (11)$$ 

where $g$ is the number of total target attributes. In this study, $g = 8$ (see Task 2 (b) of Fig. 5). 
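
The two error measures can be sketched in Python as follows, assuming that Eq. (10) is the percentage form, i.e., the absolute error relative to the true value, scaled by 100/n. The function names and the two-target sample values are hypothetical, and true values of zero are assumed not to occur.

```python
def mae_percent(actual, predicted):
    """Per-target error: mean absolute error relative to the true value, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

def mae_avg(actual_by_target, predicted_by_target):
    """Average of the per-target errors over all g targets."""
    errors = [mae_percent(a, p) for a, p in zip(actual_by_target, predicted_by_target)]
    return sum(errors) / len(errors)

# Two targets observed on two instances (assumed values).
actual = [[2.0, 4.0], [10.0, 20.0]]
predicted = [[1.0, 5.0], [10.0, 25.0]]
per_target = [mae_percent(a, p) for a, p in zip(actual, predicted)]
overall = mae_avg(actual, predicted)
print(per_target, overall)
```

Because each error is scaled by its own true value, targets of very different magnitude contribute comparably to the average.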


3.2.3. Visualization of prediction model


Task 3 of Fig. 5 summarizes the postprocessing. Since our target is to predict curves (not a
simple scalar), we reconstruct the capacity curves using the predicted coefficients of the
polynomial bases. The reconstruction starts with the backward mapping of the coefficients from [0, 1] to their
original ranges. Given the predicted matrix $Y_{pred} \in \mathbb{R}^{n \times 8}$ with each entry in [0, 1], a batch
backward mapping is simply given by


$$ Y_{final} = Y_{pred} Y_{diff} + Y_{min} \tag{12} $$


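where $Y_{final} \in \mathbb{R}^{n \times 8}$ is the final predicted coefficient matrix with original ranges, $Y_{diff} \in \mathbb{R}^{8 \times 8}$
is a diagonal matrix, and $Y_{min} \in \mathbb{R}^{n \times 8}$ is a matrix with identical rows, which are given by

$$ Y_{diff} = \begin{bmatrix} \max(y_1) - \min(y_1) & & [0] \\ & \ddots & \\ [0] & & \max(y_8) - \min(y_8) \end{bmatrix}, \qquad Y_{min} = \begin{bmatrix} \min(y_1) & \cdots & \min(y_8) \\ \vdots & & \vdots \\ \min(y_1) & \cdots & \min(y_8) \end{bmatrix} $$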


Here $y_i \in \mathbb{R}^{n \times 1}$ represents the vector of original values of the $i$th target coefficient. Since we now have all
coefficients of the polynomial bases, we can draw the envelopes of the capacity curves by using
Eq. (6).
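
The batch backward mapping can be sketched in a few lines of NumPy. The function name and the use of the original (unscaled) coefficient matrix to obtain the per-target minima and ranges are assumptions made for illustration.

```python
import numpy as np

def backward_map(Y_pred, Y_original):
    """Map predicted coefficients from [0, 1] back to the original ranges."""
    y_min = Y_original.min(axis=0)                  # min(y_i) for each target
    y_range = Y_original.max(axis=0) - y_min        # max(y_i) - min(y_i)
    Y_diff = np.diag(y_range)                       # diagonal matrix of ranges
    Y_min = np.tile(y_min, (Y_pred.shape[0], 1))    # one identical row per predicted instance
    return Y_pred @ Y_diff + Y_min

# Round trip on synthetic coefficients: scale to [0, 1], then map back
Y_orig_demo = np.random.default_rng(0).normal(size=(5, 8))
Y_scaled_demo = (Y_orig_demo - Y_orig_demo.min(axis=0)) / (
    Y_orig_demo.max(axis=0) - Y_orig_demo.min(axis=0)
)
Y_back_demo = backward_map(Y_scaled_demo, Y_orig_demo)
```

Multiplying by the diagonal matrix rescales each column independently, so the operation is equivalent to an element-wise rescaling of every target.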
3.2.4. Confidence interval


As all statistical models involve uncertainty, our multiple-target prediction model naturally
exhibits uncertainty for new predictions. For a new prediction, it is crucial to quantify the uncertainty
that is rooted in the training process, which uses randomly selected training data sets. To offer a
measure of the uncertainty behind the ML-based prediction, this study harnesses bootstrapping
[34] in a manner similar to the so-called “percentile bootstrapping.” The detailed procedure to obtain
the bootstrapping samples is as follows.


[BS 0] The initial stage begins with a training data set $M_{(i=1)}$ and a new instance $X_{new} \in \mathbb{R}^{1 \times 32}$.

[BS 1] Fit a multiple-target prediction model using the training data set $M_{(i)}$ and obtain a target
response $Y_{new(i)} = \{P_1, P_2, P_3, P_4, N_1, N_2, N_3, N_4\}^T_{(i)}$ for the given $X_{new}$.

[BS 2] Generate a new training data set $M_{(i+1)}$ by resampling 70% of the database (randomly
selected with replacement).

[BS 3] Refit the multiple-target prediction model using the training data set $M_{(i+1)}$.

[BS 4] Repeat the above steps (1-3) $n_{bs}$ times to generate $n_{bs}$ bootstrapping samples (i.e., $n_{bs}$ multi-
target predictions).
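
The resampling loop of steps BS 0 to BS 4 can be sketched as follows. The multi-target model here is a plain least-squares fit used only as a stand-in so that the example is self-contained; the paper's model is an ensemble of PCTs. All function names, the synthetic database, and the seeds are assumptions.

```python
import numpy as np

def fit_stand_in(X, Y):
    """Stand-in multi-target model (least squares with intercept), NOT the PCT ensemble."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_stand_in(W, x_new):
    return np.append(x_new, 1.0) @ W

def bootstrap_predictions(X, Y, x_new, first_idx, n_bs=100, fraction=0.7, seed=0):
    """Return an (n_bs x targets) array of predictions for x_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(round(fraction * n))
    idx = np.asarray(first_idx)                    # BS 0: initial training set
    preds = np.empty((n_bs, Y.shape[1]))
    for i in range(n_bs):                          # BS 4: repeat n_bs times
        W = fit_stand_in(X[idx], Y[idx])           # BS 1 / BS 3: fit or refit
        preds[i] = predict_stand_in(W, x_new)      # multi-target response for x_new
        idx = rng.integers(0, n, size=m)           # BS 2: resample 70% with replacement
    return preds

# Synthetic database: 60 rows, 4 descriptive variables, 8 targets
rng_demo = np.random.default_rng(42)
X_db = rng_demo.random((60, 4))
B_true = rng_demo.random((4, 8))
Y_db = X_db @ B_true + 0.01 * rng_demo.standard_normal((60, 8))
x_new_demo = np.full(4, 0.5)
first_idx_demo = rng_demo.permutation(60)[:42]
preds_demo = bootstrap_predictions(X_db, Y_db, x_new_demo, first_idx_demo)
```

Each row of the returned array is one bootstrapped multi-target prediction; the spread across rows reflects the variability introduced by the random training sets.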


In our approach, sorting the $n_{bs}$ multi-target predictions is necessary, but it is not as straightforward
as in single-target bootstrapping. To derive a physically sound approach for sorting the $n_{bs}$
multivariate predictions, we focused on the absorbed energy of the structure, i.e., the area under the
capacity curves. In general, peak-based sorting is not reasonable: e.g., in Fig. 6, curve (c) has the
largest positive peak while curve (a) has the largest peak in the negative regime.
The total absorbed energy, however, yields a single scalar that also carries a
mechanical meaning for the structure. Fig. 6 briefly illustrates how the absorbed energy of the capacity
curves is calculated and how it helps order three dissimilar curves of different peaks and
shapes. Since we represent the capacity curve envelopes with polynomial bases and have already
obtained their real-valued coefficients in $Y_{new(i)}$, $i = 1, ..., n_{bs}$ (BS 2), it is straightforward to
calculate the absorbed energy (denoted as $I_{(i)} \in \mathbb{R}^{+}$) as

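$$ I_{(i)} = \int_{D_{min,(i)}}^{D_{max,(i)}} \left[ H(d_{(i)}) \left| F^{+}_{(i)}(d_{(i)}) \right| + H(-d_{(i)}) \left| F^{-}_{(i)}(d_{(i)}) \right| \right] \mathrm{d}d_{(i)} \tag{13} $$
where the subscript $(i)$ denotes the $i$th multi-target prediction; $F^{+}_{(i)}$ and $F^{-}_{(i)}$ are the positive and
negative branches of the capacity curve envelope of Eq. (6); $|\cdot|$ returns the absolute value; $H(d)$
is the unit step function (i.e., 1.0 for $d > 0$, zero otherwise); $D_{max,(i)}$ and $D_{min,(i)}$ are the positive
maximum and negative minimum displacements of the capacity curve, respectively; and $d_{(i)}$ is the
displacement coordinate. The cumulative distribution of the bootstrap samples
(denoted as $G$), evaluated at a constant $b$, is expressed as: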


$$ G(b) = F\{ I_{(i)} \le b \}, \quad i = 1, ..., n_{bs} \tag{14} $$


where $F$ is the frequency of $I_{(i)}$. An instance with a specific percentile ($\alpha$) is represented as:

$$ \gamma^{*}(\alpha) = G^{-1}(\alpha) \tag{15} $$


Fig. 6. Illustration of the calculation of the absorbed energy used for sorting in bootstrapping. Three capacity
curves (a, b, c) with different peaks and shapes are shown ($\oplus$ denotes a summation operation).
Fig. 7. 95% percentile confidence interval of wall WSH3 under 590 MPa shear strength.


where $G^{-1}$ is the inverse function of $G$. Therefore, the 95% percentile confidence interval is
given by


$$ \left( \gamma^{*}(0.025),\ \gamma^{*}(0.975) \right) \tag{16} $$


In this paper, $n_{bs} = 100$ is adopted. Here, the 95% confidence interval is the range that covers 95% of
the bootstrapped predicted curves when they are ordered by total absorbed energy. For instance, Fig. 7
shows a 95% percentile confidence interval. Note that there is ample room to extend the
proposed approach, especially regarding how to define the “order” of the bootstrapped samples.
There are also other methods for uncertainty quantification, such as the jackknife method [35],
which is straightforward and does not require random sampling.
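
The energy-based ordering and the percentile interval can be sketched as below. The sketch assumes that each bootstrapped capacity curve has already been evaluated on a common displacement grid, integrates the absolute force with the trapezoidal rule instead of a closed-form polynomial integral, and picks the bounding curves by a simple floor/ceil index rule; these choices and all names are illustrative assumptions.

```python
import numpy as np

def absorbed_energy(d, force):
    """Area under |force| over the displacement grid d (trapezoidal rule)."""
    d = np.asarray(d, dtype=float)
    f = np.abs(np.asarray(force, dtype=float))
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(d)))

def percentile_curves(d, curves, lower=0.025, upper=0.975):
    """Order curves by absorbed energy and return the lower and upper bounding curves."""
    energies = np.array([absorbed_energy(d, f) for f in curves])
    order = np.argsort(energies)                   # ascending absorbed energy
    n = len(order)
    lo = order[int(np.floor(lower * (n - 1)))]     # assumption: floor index for the lower bound
    hi = order[int(np.ceil(upper * (n - 1)))]      # assumption: ceil index for the upper bound
    return curves[lo], curves[hi], energies

# Synthetic example: 100 linear "curves" F = s * d with slopes s = 1..100 in random order
d_grid = np.linspace(-1.0, 1.0, 201)
slopes = np.random.default_rng(0).permutation(np.arange(1, 101))
curves_demo = slopes[:, None] * d_grid[None, :]
lower_curve, upper_curve, energies_demo = percentile_curves(d_grid, curves_demo)
```

For these synthetic curves the absorbed energy equals the slope, so the interval is bounded by the curves with slopes 3 and 98.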
4. Results


4.1. Impact of PCT types on prediction accuracy 


To investigate the impact of PCT types on the performance of the MTRM, we considered two
operational settings of PCTs. The first type, conventional PCTs, considers both the descriptive
variables X and the target attributes Y to partition instances into subsets during searching. In
contrast, the second type, the so-called trial PCTs, partitions instances into subsets in terms of
only the descriptive variables X. For comparison, we used identical training and test data from
the capacity curve database (182 rows) to train the model and make predictions. As
mentioned in Task 1 (b) of Fig. 5, $D_M$ of every instance is recorded so that each
specimen’s relative position in the multivariate space can be easily visualized in a radar plot (e.g., Fig. 8).
The detailed values of the created database and $D_M$ of all instances are available in [29]. Four
selected walls (indexed by 4, 20, 67, and 88) and their $D_M$ values are presented in Fig. 8. Fig. 9
presents the predicted capacity curves of the selected walls. The corresponding MAEs of
these four capacity curves predicted by the conventional PCTs and the trial PCTs are summarized in
Table 5.


Table 5
MAEs of prediction by conventional PCTs versus trial PCTs.
Wall index    $D_M$    MAE (conventional)    MAE (trial)
4             17.9     2.6%                  4%
20            16.8     2.3%                  18%
67            0.62     1.1%                  1.2%
88            1.79     0.1%                  0.1%


Another prediction, for wall 175 with $D_M$ = 1.89, is plotted in Fig. 10 and also supports the good
prediction of both PCTs for smaller $D_M$. The prediction accuracy of the conventional PCTs is more
stable than that of the trial PCTs, most visibly for wall 20 in Table 5 (2.3% versus 18%). In addition,
both conventional PCTs and trial PCTs make relatively accurate predictions for walls 67 and 88, which
have small $D_M$, but less accurate predictions for walls 4 and 20, which have large $D_M$. To some extent,
the trial PCTs are similar to “clustering” since they
consider only the descriptive attributes. In contrast, the conventional PCTs work together with
the rule ensemble to better interpret and explore complex data. In view of the high
dimensionality of our database (i.e., 32 variables), the conventional PCTs appear to slightly
outperform the trial PCTs. Based on this outcome, the conventional PCTs were utilized in all the
simulations hereafter.


4.2. Impact of the extended database on the prediction 


The discussion so far is inherently based on the training data. PCTs can be expected to
yield better predictions when a target instance resides within the boundary of the
available training data. The prediction model performs so-called “extrapolation” when a
new target has little similarity with, and falls outside, the existing training data. To investigate the
influence of this extrapolation, we first trained the PCTs with 70% of the capacity curve database
(182 rows), sampled as training data, and made predictions for the 30% test data plus a new
instance (SW1-2). The $D_M$ of SW1-2, along with those of the other 182 instances, is
visualized in Fig. 11 (a). The $D_M$ of SW1-2 (marked with a star) indicates that the new
instance lies outside the existing training data space. The predicted capacity curve of
SW1-2 is visualized in Fig. 12 as the dashed curve. Second, we collected 33 new rectangular
shear walls from [36] and merged them into the capacity curve database, enlarging it to 214
rows. We then repeated the scenario by training the PCTs with 70% of the
extended capacity curve database (214 rows), sampled as training data, and predicting the rest as test data.


Fig. 8. Radar plot of 182 walls with varying $D_M$: (a) wall 4 with $D_M$ = 17.99; (b) wall 20 with $D_M$ = 16.76;
(c) wall 67 with $D_M$ = 0.62; (d) wall 88 with $D_M$ = 1.79.

Fig. 9. Predicted capacity curves using the conventional PCTs and the trial PCTs: (a) wall 4; (b) wall 20;
(c) wall 67; (d) wall 88.

Fig. 10. Predicted capacity curves of wall 175 using the conventional PCTs and the trial PCTs.


Note that we force SW1-2 to be one of the test data in both scenarios for comparison.
The predicted capacity curve of SW1-2 is visualized as the dashed curve delimited by dots in
Fig. 12. For the first scenario, the prediction of SW1-2 under the original
database diverges from the experimental F-D curve of SW1-2 in Fig. 12, and the position of SW1-2
(marked with a star) in Fig. 11 (a) indicates that the new instance lies outside the existing training data
space. For the second scenario, the prediction of SW1-2 under the
extended database is significantly closer to the experimental curve than the prediction of the first
scenario. The sparse data space around SW1-2 in Fig. 11 (a) is filled
with multiple samples in Fig. 11 (b), which shows that the 33 new shear
walls are highly similar to specimen SW1-2 in terms of $D_M$. Comparing the two
scenarios, extending the database with instances that are highly similar to SW1-2
in terms of $D_M$ positively influences the prediction. These similar instances fill in the
sparse data space around SW1-2 and lead to a more comprehensive model. Conversely,
extending the database with instances of low similarity to SW1-2 will rarely improve the prediction of
SW1-2.


4.3. Impact of erroneous data on prediction 


Nowadays, a tremendous amount of engineering data is accumulating in our domain, and it is frequently
reported with incomplete and erroneous values. Such deficient data do not facilitate the
training of predictive models and carry the risk of generating unstable predictions.
One possible way to minimize the negative influence of data deficiency on prediction is to
leverage imputation to handle missing values. The impact of an existing imputation method,
fractional hot deck imputation, on the prediction of engineering data has been investigated in
[37]. The robustness of the MTRM against erroneous data is one of the most important criteria for
evaluating the model objectively. Note that the naive version of the capacity curve database (182
rows) had 2.3% erroneous values within the descriptive variable matrix X because of human
errors. Fortunately, the author was aware of these errors ahead of time and corrected the capacity
curve database with extreme caution. To investigate the impact of erroneous data on the
prediction, the author utilized 30% of sampled erroneous data to train the conventional PCTs
and generated predictions for the walls indexed by 20 and 88 (the same walls denoted
by (b) and (d) in Fig. 8). The predicted F-D curves of these two targets are visualized in Fig. 13
as dashed curves, in contrast with the capacity curves predicted with the correct database (dashed curves
delimited with dots). Fig. 13 suggests that the conventional PCTs are fairly robust against erroneous
data.
Fig. 11. (a) $D_M$ of 183 wall instances; (b) $D_M$ of 214 wall instances. Note that $D_M$ of wall SW1-2 is marked
with a star.

Fig. 12. The predicted capacity curves of SW1-2 based on the original database and the extended
database, respectively.

Fig. 13. The predicted capacity curves based on the erroneous database versus those based on the correct
database: (a) wall 20; (b) wall 88.


5. Conclusions 


In the hope of providing an efficient and reliable tool that can help quickly determine the
capacity curves of F-D responses, this paper utilizes a multi-target regression model to generate
the prediction. To the best of our knowledge, the prediction of capacity curves had never been
attempted in infrastructure engineering. The general conclusion is that the MTRM implementing
conventional PCTs combined with ensemble methods generates fairly good predicted F-D curves
in terms of MAE and visualization. Its confidence interval and robustness against erroneous data
strengthen the reliability of the method. Compared with the traditional approaches of conducting a
real experiment or simulating finite element models, the proposed ML-based method
will significantly reduce expenses in terms of time and money.


Future work will focus on several aspects that can improve the performance of
the method. First, the universality of the proposed capacity curve database is restricted to
rectangular shear walls. Extending the method to other types of infrastructure will remove
this bottleneck. Second, the capacity curve database currently consists of 32
descriptive variables, which may cause overfitting. An attribute selection test
based on empirical engineering knowledge or an attribute selection algorithm [34] may
improve the precision of the prediction. Last, the limited size of the proposed capacity curve
database may result in unstable and biased models. A sufficiently large database, extended in
the future, will help to produce more accurate and stable results.
Appendix A: Example of input files


The user must provide three input files to run Clus. The training data file should strictly
follow the format:


@RELATION "WallDB_train"

@ATTRIBUTE var1 numeric
@ATTRIBUTE var2 numeric
...
@DATA

0,0,0,0,1,0.25,0,0.256666667,0,0,0.048,0,0.226190476,5.07E-08,1,1,0.155506608,0,...


Clus is format-sensitive. The exact name of the training data file should be included at the
beginning. Afterward, users have to list all attributes along with their data types. Note that training
data must be listed row-wise: a comma delimits each element in a row, and each row is delimited
by a new line. The test data file follows an identical format. In addition, a file specifying all
the parameter settings is described as:


[Attributes]
Target = 33-40
Clustering = 1-40
Descriptive = 1-32

[Data]
File = WallDB_train.arff
TestSet = WallDB_test.arff

[Tree]
Heuristic = VarianceReduction
PruningMethod = M5Multi
ConvertToRules = AllNodes

[Ensemble]
Iterations = 100
EnsembleMethod = RForest

[Output]
WritePredictions = {Train,Test}


Users can control the types of PCTs in the Attributes section. The Data section lists the full names, with
extensions, of the training and test data files. The Tree and Ensemble sections specify additional settings
for the PCTs. For more details, the Clus manual provides comprehensive explanations of each item
in these three files.
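
The training-file layout described in this appendix can also be generated programmatically. The helper below is a minimal sketch that writes a numeric table in that layout; the function name, the automatic var1, var2, ... attribute naming, and the tiny demo table are assumptions, and the output has not been checked against Clus itself.

```python
import os
import tempfile

def write_arff(path, relation, rows):
    """Write numeric rows as a relation header, one attribute line per column, then the data."""
    n_attr = len(rows[0])
    lines = [f'@RELATION "{relation}"', ""]
    # One numeric attribute per column, named var1, var2, ... (assumed naming)
    lines += [f"@ATTRIBUTE var{k} numeric" for k in range(1, n_attr + 1)]
    lines.append("@DATA")
    for row in rows:
        if len(row) != n_attr:
            raise ValueError("every row must have the same number of attributes")
        # Elements are comma-delimited; each row goes on its own line
        lines.append(",".join(repr(float(v)) for v in row))
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")

# Hypothetical 2 x 3 table written to a temporary directory
demo_rows = [[0.0, 0.25, 1.0], [0.5, 0.0, 0.75]]
demo_path = os.path.join(tempfile.mkdtemp(), "WallDB_train.arff")
write_arff(demo_path, "WallDB_train", demo_rows)
```

The same helper can be reused for the test data file, since it follows an identical format.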
Appendix B: Graphic PCTs

This appendix expands on the toy graphic PCT after rule ensemble in Fig. 1 (b). An incomplete
realistic graphic PCT for the predictions in Figs. 9 and 10 is obtained from the output file of Clus:


Original Model
**************

var1 > 0.0
+--yes: var14 > 0.0
|       +--yes: var1? > 0.569029851
|       |       +--yes: var? > 0.545454545
|       |       |       +--yes: var6 > 0.5
|       |       |       |       +--yes: [0.307979,0.527636,0.6267,0.962626,0.756192,0.477687,0,0.26211]: 3
|       |       |       |       +--no:  var? > 0.276666667
|       |       |       |               +--yes: [0.240821,0.661636,0.422535,0.971888,0.796779,0.588153,0,0.400133]: 2
|       |       |       |               +--no:  var? > 0.125
|       |       |       |                       +--yes: var13 > 0.80952381
|       |       |       |                       |       +--yes: var17 > 0.086105727
|       |       |       |                       |       |       +--yes: [0.253708,?,0.479085,0.967333,0.789692,0.563855,0,0.309887]: 2
|       |       |       |                       |       |       +--no:  [0.259163,0.624,0.489324,?,0.7865,0.557831,0,0.301482]: 3
|       |       |       |                       |       +--no:  [?,?,?,?,?,?,?,?]: 2
|       |       |       |                       +--no:  [0.253731,0.630737,0.497282,0.961661,0.788452,0.559556,0,0.199005]: 2
|       |       |       +--no:  var6 > 0.5
|       |       |               +--yes: [0.698066,0.250162,0.568923,0.990174,0.310385,0.231727,0,0.681221]: 2
|       |       |               +--no:  var8 > 0.35
|       |       |                       +--yes: [0.210789,0.726788,0.310329,0.977614,0.816987,0.652209,0,0.509401]: 3
|       |       |                       +--no:  var? > 0.1425
|       |       |                               +--yes: [0.199034,0.76,0.219718,0.989024,0.834487,0.704026,0,0.757133]: 3
|       |       |                               +--no:  var17 > 0.220264317
|       |       |                                       +--yes: var17 > 0.308370044
|       |       |                                       |       +--yes: [0.214614,0.718727,0.321127,0.979046,0.81375,0.643574,0,0.551427]: 2
|       |       |                                       |       +--no:  [0.220048,0.708182,0.337324,0.977094,0.809904,0.633936,0,0.510949]:
|       |       |                                       +--no:  var? > 0.166666667
|       |       |                                               +--yes: var13 > 0.619047619
|       |       |                                               |       +--yes: var17 > 0.13215859
|       |       |                                               |       |       +--yes: [0.219722,0.7079,0.345352,0.974588,0.807923,0.6249,0,0.429993]: 4
|       |       |                                               |       |       +--no:  [0.197343,0.753816,0.267606,0.979599,0.808654,0.628313,0,0.44791]:
|       |       |                                               |       +--no:  [0.203019,0.741636,0.288732,0.978395,0.809712,0.631325,0,0.457863]: 2
|       |       |                                               +--no:  [0.227295,0.694182,0.362676,0.974393,0.81128,0.635944,0,0.471798]: 2
|       |       +--no:  var14 > 1.3E-7
|       |               +--yes: var? > 0.306666667
|       |               |       +--yes: var? > 0.5
|       |               |       |       +--yes: var? > 0.666666667
|       |               |       |       |       +--yes: [0.256256,0.633691,0.458451,0.972506,0.80174,0.595602,0,0.451228]: 2
|       |               |       |       |       +--no:  var? > 0.375
|       |               |       |       |               +--yes: [0.271884,0.599162,0.512254,0.971107,0.775828,0.529707,0,0.374253]: 2
|       |               |       |       |               +--no:  var17 > 0.176211454
|       |               |       |       |                       +--yes: var6 > 0.125
|       |               |       |       |                       |       +--yes: var17 > 0.264317161
|       |               |       |       |                       |       |       +--yes: [0.259529,0.623364,0.48?,0.970681,0.782962,0.549538,0,0.388852]:
|       |               |       |       |                       |       |       +--no:  var13 > 0.833333333
|       |               |       |       |                       |       |               +--yes: [0.259227,0.624994,0.480376,0.970348,0.784808,0.555408,?,?]:
|       |               |       |       |                       |       |               +--no:  [0.259541,0.624182,0.48331,0.969643,0.785067,?,?,?]:
|       |               |       |       |                       |       +--no:  [0.259179,0.624709,0.490845,0.965576,0.784036,0.553434,0,0.28069]: 2
|       |               |       |       |                       +--no:  [0.257907,0.628109,0.478638,0.968677,?,0.558701,0,0.35324]: 3
|       |               |       |       +--no:  [0.26?,0.617218,0.500211,0.96704,0.723106,0.55006,0.5,0.318514]:


We uploaded the output file of the predictions in Figs. 9 and 10, with the full graphic PCTs, to [29].
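
As a reading aid for the printed tree, the sketch below shows how a prediction is obtained from such a structure: each internal node tests one descriptive variable against a threshold, the yes or no branch is followed, and the vector stored at the reached leaf is returned. The nested-dictionary representation, the function name, and every number in the toy tree are hypothetical and are not taken from the Clus output.

```python
def predict_with_tree(node, x):
    """Follow yes/no branches until a leaf is reached; x maps variable names to values."""
    while "leaf" not in node:
        branch = "yes" if x[node["var"]] > node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]

# Hypothetical two-level tree; variables, thresholds, and leaf vectors are made up
toy_tree = {
    "var": "var1", "threshold": 0.0,
    "yes": {
        "var": "var14", "threshold": 0.5,
        "yes": {"leaf": [0.3, 0.5]},
        "no": {"leaf": [0.2, 0.7]},
    },
    "no": {"leaf": [0.1, 0.9]},
}

toy_prediction = predict_with_tree(toy_tree, {"var1": 1.0, "var14": 0.2})
```

In the realistic tree, each leaf vector holds eight values, one per target attribute, rather than the two used in this toy example.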


Appendix C: Attributes details of the capacity curve database 


Attributes                                      Detail
I                                               Moment of inertia
Length                                          Length of shear wall
Thickness                                       Thickness of shear wall
Height                                          Height of shear wall
Number of floors                                Number of floors
Axial Force Ratio                               Axial force ratio
Cover thickness                                 Cover thickness
Concrete_fc                                     Concrete compressive strength
Concrete_ft                                     Concrete yield strength
bb                                              Width of boundary element
hb                                              Thickness of boundary element
cb                                              Cover thickness in boundary element
Steel_Vertical1_fy                              Yield strength of boundary longitudinal reinforcement
Steel_Vertical1_fu                              Ultimate stress of boundary longitudinal reinforcement
Steel_Vertical1_Spacing                         Spacing of boundary longitudinal reinforcement
Steel_Vertical1_strain at fu                    Ultimate strain of boundary longitudinal reinforcement
Steel_Vertical1_Diameter                        Diameter of boundary longitudinal reinforcement
Steel_Vertical2_fy                              Yielding strength of web longitudinal reinforcement
Steel_Vertical2_fu                              Ultimate stress of web longitudinal reinforcement
Steel_Vertical2_Diameter                        Diameter of web longitudinal reinforcement
Steel_Horizontal1_fy                            Yielding strength of boundary transverse reinforcement
Steel_Horizontal1_fu                            Ultimate stress of boundary transverse reinforcement
Steel_Horizontal1_strain at fu                  Ultimate strain of boundary transverse reinforcement
Steel_Horizontal1_Spacing                       Spacing of boundary transverse reinforcement
Steel_Horizontal1_Diameter                      Diameter of boundary transverse reinforcement
Steel_Stirrup1_fy                               Yielding strength of stirrups
Steel_Stirrup1_fu                               Ultimate stress of stirrups
Steel_Stirrup1_strain at fu                     Ultimate strain of stirrups
Steel_Stirrup1_spacing                          Spacing of stirrups
Steel_Stirrup1_Diameter                         Diameter of stirrups
Number of longitudinal bars at wall boundary    Number of longitudinal bars at wall boundary
$P_1 \sim P_4$ and $N_1 \sim N_4$               Polynomial bases parameters